Optical character recognition (OCR) is the process of converting an image that depicts human-readable text into a document that contains computer-readable text. The computer-readable text may be stored using an appropriate character-encoding scheme, such as UTF-8 (UCS Transformation Format, 8-bit) for storing Unicode characters, ASCII (American Standard Code for Information Interchange) for storing English-language letters, digits, and common punctuation, or another appropriate encoding scheme.
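The choice of encoding scheme can be illustrated with a minimal Python sketch. It assumes the OCR step has already produced a text string; the function name `encode_recognized_text` and the sample strings are hypothetical and serve only as illustration. The sketch shows that text limited to ASCII characters yields the same bytes under both schemes, while text containing other Unicode characters can be stored as UTF-8 but not as ASCII.

```python
from typing import Dict, Optional


def encode_recognized_text(text: str) -> Dict[str, Optional[bytes]]:
    """Encode already-recognized text under UTF-8 and ASCII.

    Hypothetical helper for illustration only. The ASCII entry is
    None when the text contains a character ASCII cannot represent.
    """
    utf8_bytes = text.encode("utf-8")
    try:
        ascii_bytes: Optional[bytes] = text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character falls outside the 7-bit ASCII range.
        ascii_bytes = None
    return {"utf-8": utf8_bytes, "ascii": ascii_bytes}


# Sample strings standing in for OCR output (assumed, not real data).
invoice = encode_recognized_text("Total: 42.00 USD")
cafe = encode_recognized_text("Café total: 3 €")

# Pure ASCII text produces identical bytes under both encodings.
print(invoice["utf-8"] == invoice["ascii"])

# Non-ASCII text cannot be stored as ASCII, but round-trips through UTF-8.
print(cafe["ascii"] is None)
print(cafe["utf-8"].decode("utf-8"))
```

In this sketch, a system that must handle documents in more than one language would likely store the recognized text as UTF-8, since ASCII fails on any accented letter or currency symbol outside its range.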
OCR processing is often used in conjunction with document scanning to capture the text of physical documents such as receipts, invoices, bank statements, business cards, resumes, and other types of documents. OCR may also be applied directly to electronic images, e.g., TIFF (Tagged Image File Format) images or other appropriate electronic images, to extract the depicted text into a computer-readable format.